-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add BOLT Makefile #54107
Add BOLT Makefile #54107
Conversation
The corecompiler.ji build seems representative so I'll use that as an overall benchmark:
Here's a litany of benchmarks:
With BOLT:
With PGO+LTO:
With PGO+LTO+BOLT
Sysimage and pkgimage building ( On 4a2c593 (did not build LLVM from source):
With BOLT:
With PGO+LTO:
With PGO+LTO+BOLT:
On 4a2c593 (did not build LLVM from source): 5.45s For this string benchmark, the builds with PGO were sufficiently noisy that I think that all the builds with optimized were as fast as each other. |
Profiling the sysimg and pkgimg build worked.
Now seems to take 4.6s, I'm very sceptical of this result though. I was concerned that as |
Here's the updated benchmark for BOLT, now profiling the sysimg build:
The string benchmark takes 4.85s |
So the new corecompiler.ji results: |
Oh this looks very cool, thanks for working on this @Zentrik! |
Just a side note but using the tests of LoopVectorization seems non-ideal since it is deprecated, has cases where it asserts or segfaults, and is quite special e.g. I'm that it heavily uses llvmcall. |
Thank you @Zentrik for working on this. |
in your terminal. Then you can find the optimized binary in the contrib/pgo-lto-bolt/optimized.build directory. Alternatively, you can download binaries for a PGO+LTO build for x86_64 linux gnu here. |
@Zentrik do you plan to make any other changes? Is this ready for review? |
This is ready |
This seems to break x86_64 windows build |
Seems to prevent rebuilding pkgimages which is necessary as o/w segfaults. Maybe segfault is due to optimizing sys.so?
Relocations bloat binary size, it doesn't get loaded into memory but e.g. libjulia-codegen gains ~60mb. Ofc can be stripped but that's annoying.
TODO: remove BOLTLDFLAGS and BOLTCXXFLAGS, we can just add them straight to llvm's flags once I copy over clang detection.
BaseBenchmarks.SUITE["inference"]["allinference"]["Base.init_stdio(::Ptr{Cvoid})"] takes about 3.1s without optimizing libjulia-internal.so and on master. With optimized libjulia-internal.so, it takes about 2.9s. I suspect InferenceBaseBenchmarks don't spend much time in LLVM so optimizing it doesn't help much. JULIA_LLVM_ARGS=-time-passes julia script-45395.jl spends about 4.3s in LLVM passes with optimized libLLVM (optimizing libjulia-internal has no effect) whilst taking about 4.9s on master. Total time (~7.9s and 8.7s on master) isn't affected by the optimization of libjulia-internal, which makes sense as we spend most time in LLVM passes. This reverts commit 0bb57ec.
This should hopefully be unnecessary now that we don't optimize sys.so.
…lia-codegen This prevents it from rebuilding other stuff like libjulia-codegen which depends on libjulia-internal and so libjulia-codegen loses it's instrumentation.
…et BOLTed Now that we pass `-fno-reorder-blocks-and-partition` to libjulia-codegen it seems BOLTing libjulia-internal no longer segfaults.
For one thing BOLT found split functions in libjulia-internal and codegen which could cause problems. Also it seemed to remove BOLT's performance improvement, given `jl_type_infer` was split maybe BOLT skipped it and other important functions.
I didn't try to reuse the bolt or pgo-lto Makefiles by including them as that seemed difficult and would probably break on any non-trivial change to either.
Was triggering in ssair and codegen tests, also doing `./julia s` where `s` is not a file.
[skip-ci]
BOLT only supports ELF binaries. [no-ci]
Seems to have resolved itself. |
Probably good with a NEWS entry? |
Even if it is only for advanced users it is in my opinion good to have some kind of external reference to it (NEWS + (dev)docs). Like, if I want to test this now I don't really know where to start. |
Ref: #54107 (comment). If accepted, I'll add the NEWS.md entry for PGO/LTO in the release-1.11 branch too.
This uses LLVM's BOLT to optimize libLLVM, libjulia-internal and libjulia-codegen. This improves the allinference benchmarks by about 10% largely due to the optimization of libjulia-internal. The example in issue JuliaLang#45395 which stresses LLVM significantly more also sees a ~10% improvement. We see a 20% improvement on ```julia @time for i in 1:100000000 string(i) end ``` When building corecompiler.ji: BOLT gives about a 16% improvement PGO+LTO gives about a 21% improvement PGO+LTO+BOLT gives about a 23% improvement This only requires a single build of LLVM and theoretically none if we change the binary builder script (i.e. we build with relocations and the `-fno-reorder-blocks-and-partition` and then we can use BOLT to get binaries with no relocations and reordered blocks and then ship both binaries?) compared to the 2 in PGO. Also, this theoretically can improve performance of a PGO+LTO build by a couple %. The only reproducible test problem I see is that the BOLT, PGO+LTO and PGO+LTO+BOLT builds all cause `readelf` to emit warnings as part of the `osutils` tests. ``` readelf: Warning: Unrecognised form: 0x22 readelf: Warning: DIE has locviews without loclist readelf: Warning: Unrecognised form: 0x23 readelf: Warning: DIE at offset 0x227399 refers to abbreviation number 14754 which does not exist readelf: Warning: Bogus end-of-siblings marker detected at offset 212aa9 in .debug_info section readelf: Warning: Bogus end-of-siblings marker detected at offset 212ab0 in .debug_info section readelf: Warning: Further warnings about bogus end-of-sibling markers suppressed ``` The unrecognised form warnings seem to be a bug in binutils, https://sourceware.org/bugzilla/show_bug.cgi?id=28981. `DIE at offset` warning I believe was fixed in binutils 2.36, https://sourceware.org/bugzilla/show_bug.cgi?id=26808, but `ld -v` says I have 2.38. I assume these are all benign. I also don't see them on CI here https://buildkite.com/julialang/julia-buildkite/builds/1507#018f00e7-0737-4a42-bcd9-d4061dc8c93e so could just be a local issue.
Ref: JuliaLang#54107 (comment). If accepted, I'll add the NEWS.md entry for PGO/LTO in the release-1.11 branch too.
This uses LLVM's BOLT to optimize libLLVM, libjulia-internal and libjulia-codegen.
This improves the allinference benchmarks by about 10% largely due to the optimization of libjulia-internal.
The example in issue #45395 which stresses LLVM significantly more also sees a ~10% improvement.
We see a 20% improvement on
When building corecompiler.ji:
BOLT gives about a 16% improvement
PGO+LTO gives about a 21% improvement
PGO+LTO+BOLT gives about a 23% improvement
This only requires a single build of LLVM and theoretically none if we change the binary builder script (i.e. we build with relocations and the
-fno-reorder-blocks-and-partition
and then we can use BOLT to get binaries with no relocations and reordered blocks and then ship both binaries?) compared to the 2 in PGO. Also, this theoretically can improve performance of a PGO+LTO build by a couple %.The only reproducible test problem I see is that the BOLT, PGO+LTO and PGO+LTO+BOLT builds all cause
readelf
to emit warnings as part of theosutils
tests.The unrecognised form warnings seem to be a bug in binutils, https://sourceware.org/bugzilla/show_bug.cgi?id=28981.
DIE at offset
warning I believe was fixed in binutils 2.36, https://sourceware.org/bugzilla/show_bug.cgi?id=26808, butld -v
says I have 2.38.I assume these are all benign. I also don't see them on CI here https://buildkite.com/julialang/julia-buildkite/builds/1507#018f00e7-0737-4a42-bcd9-d4061dc8c93e so could just be a local issue.
TODO:
-fno-reorder-blocks-and-partition
as we don't run BOLT on all binaries.-fno-reorder-blocks-and-partition
when using clang.warning: address range table at offset 0x0 has a premature terminator entry at offset 0x10
, filed [BOLT] Optimized binary has premature terminator entry warning llvm/llvm-project#89508.